Comparing Multiple Approaches to the Cross-Lingual 5W Task

Authors

  • Kristen Parton
  • Kathleen R. McKeown
  • Bob Coyne
  • Mona T. Diab
  • Ralph Grishman
  • Dilek Hakkani-Tür
  • Mary Harper
  • Heng Ji
  • Wei Yun Ma
  • Adam Meyers
  • Sara Stolbach
  • Ang Sun
  • Gokhan Tur
  • Wei Xu
  • Sibel Yaman
Abstract

Cross-lingual tasks are especially difficult due to the compounding effect of errors in language processing and errors in machine translation (MT). In this paper, we present an error analysis of a new cross-lingual task: the 5W task, a sentence-level understanding task which seeks to return the English 5W's (Who, What, When, Where and Why) corresponding to a Chinese sentence. We analyze systems that we developed, identifying specific problems in language processing and MT that cause errors. The best cross-lingual 5W system was still 19% worse than the best monolingual 5W system, which shows that MT significantly degrades sentence-level understanding. Neither source-language nor target-language analysis was able to circumvent problems in MT, although each approach had advantages relative to the other. A detailed error analysis across multiple systems suggests directions for future research on the problem.

Related resources

Who, What, When, Where, Why? Comparing Multiple Approaches to the Cross-Lingual 5W Task

Cross-lingual tasks are especially difficult due to the compounding effect of errors in language processing and errors in machine translation (MT). In this paper, we present an error analysis of a new cross-lingual task: the 5W task, a sentence-level understanding task which seeks to return the English 5W's (Who, What, When, Where and Why) corresponding to a Chinese sentence. We analyze systems...

Overview of the NTCIR-9 Crosslink Task: Cross-lingual Link Discovery

This paper presents an overview of NTCIR-9 Cross-lingual Link Discovery (Crosslink) task. The overview includes: the motivation of cross-lingual link discovery; the Crosslink task definition; the run submission specification; the assessment and evaluation framework; the evaluation metrics; and the evaluation results of submitted runs. Cross-lingual link discovery (CLLD) is a way of automaticall...

Automated Cross-lingual Link Discovery in Wikipedia

At NTCIR-9, we participated in the cross-lingual link discovery (Crosslink) task. In this paper we describe our approaches to discovering Chinese, Japanese, and Korean (CJK) cross-lingual links for English documents in Wikipedia. Our experimental results show that a link mining approach that mines the existing link structure for anchor probabilities and relies on the “translation” using cross-l...

A Survey on Multi-Document Summarization

Multi-document summarization aims at delivering the majority of information content from multiple documents using much less lengthy texts, usually a short paragraph of several hundred words. This paper surveys several different approaches to multi-document summarization by first building a unified high level view of the multi-document summarization problem, and then comparing different approach...

UHD: Cross-Lingual Word Sense Disambiguation Using Multilingual Co-Occurrence Graphs

We describe the University of Heidelberg (UHD) system for the Cross-Lingual Word Sense Disambiguation SemEval-2010 task (CL-WSD). The system performs CL-WSD by applying graph algorithms previously developed for monolingual Word Sense Disambiguation to multilingual co-occurrence graphs. UHD has participated in the BEST and out-of-five (OOF) evaluations and ranked among the most competitive systems...

Publication date: 2009